scheduler: small csched_cpu_pick() adjustments
When csched_cpu_pick() decides to move a vCPU to a different pCPU, so
far in the vast majority of cases it selected the first core/thread of
the most idle socket/core. When there are many short executing
entities, this will generally lead to them not getting evenly
distributed (since primary cores/threads will be preferred), making
the need for subsequent migration more likely. Instead, candidate
cores/threads should get treated as symmetrically as possible, and
hence this changes the selection logic to cycle through all
candidates.
Further, since csched_cpu_pick() will never move a vCPU between
threads of the same core (and since the weights calculated for
individual threads of the same core are always identical), rather than
removing just the selected pCPU from the mask that still needs looking
at, all siblings of the chosen pCPU can be removed at once without
affecting the outcome.
Signed-off-by: Jan Beulich <jbeulich@novell.com>